1 Prerequisites

The EASIIER package requires other packages to be installed. These include: ggplot2, VennDiagram, RColorBrewer, tibble, dplyr, stringr, rasterpdf, tidyverse, reshape, ggsignif, tools and meta all available in CRAN. The package also requires other packages from Bioconductor to perform annotations and enrichment : IlluminaHumanMethylation450kanno.ilmn12.hg19, IlluminaHumanMethylationEPICanno.ilm10b4.hg19, missMethyl, org.Hs.eg.db, GenomicRanges and rtracklayer.

To perform meta-analyses we use GWAMA, a software for meta analysis developed by the Intitute of Genomics from University of Tartu, GWAMA is available at https://genomics.ut.ee/en/tools/gwama-download and must be installed on the computer where we are going to run EASIER. The EASIER R package has been developed run under version R >= 3.6.3 .

2 Overview

The EASIER package performs downstream analyses of epigenenome-wide association studies (EWAS):

  • Module 1: Quality control of EWAS results
    • Input: EWAS results of each cohort and model
    • Configuration: input and output folders, array type, sample size, ethnicity, CpG exclusion criteria, etc.
    • Output, for each cohort and model:
      • QCed EWAS results
      • List of filtered CpGs and reason
      • Summary of QCed EWAS results
      • Plots of QCed EWAS results
    • Output, across cohorts and models:
      • Boxplot of beta values
      • Precision plot
      • Venn diagram of top CpGs
  • Module 2: Meta-analysis of EWAS results
    • Input: QCed EWAS results of each cohort and model
    • Configuration: input and output folders, array type, models, etc.
    • GWAMA (inverse-variance weighted meta-analysis)*
    • Output, for each model:
      • EWAS meta-analysis results (fixed and random)
      • Summary of EWAS meta-analysis results
      • Plots of EWAS meta-analysis results
  • Module 3: Functional enrichment of EWAS results
    • Input
      • EWAS meta-analysis results (CpG, p-value, beta)
      • List of CpGs
    • Output: tables and plots with enrichments
    • Gene-set enrichment (CpGs annotated to genes)
      • missMethyl (GO, KEGG, MSigDB)
      • ConsensusPath (several DB)
    • Molecular enrichment
      • Gene relative position
      • CpG island relative position
    • Specific for blood
    • Specific for placenta
      • ROADMAP fetal placenta chromatin states
      • Placenta prtially methylated domains (PMDs)
      • Placenta imprinted regions

In this vignette we will show how to apply EASIER R package with the EWAS results from three cohorts and two distinct models for each cohort.

3 Getting started

To install the required packages run the following commands in R:

# Install devtools
install.packages("devtools")

# Install required packages for brgeEnrich package
devtools::source_url("https://raw.githubusercontent.com/isglobal-brge/brgeEnrich/HEAD/installer.R")

# Install brgeEnrich R package
devtools::install_github("isglobal-brge/brgeEnrich@HEAD")

# Install required packages for EASIER package
devtools::source_url("https://raw.githubusercontent.com/isglobal-brge/EASIER/HEAD/installer.R")

# Install EASIER R package
devtools::install_github("isglobal-brge/EASIER@HEAD")

The first command installs a library called devtools that is needed to source online scripts and run them. The next line sources a file that installs all the required packages from R-CRAN and Bioconductor to run EASIER R package. Lastly, the EASIER R package and the corresponding vignettes are installed.

After package is installed, we can load it with command:

To access to vignette documents you only have to use vignette("EASIER")

4 Quality control

4.1 Quality Control Flowchart

\label{fig:qcworkflow}Quality control flowchart. This flowchart is used to coduct the quality control of EWAS results (QualityControl.R)

Figure 1: Quality control flowchart
This flowchart is used to coduct the quality control of EWAS results (QualityControl.R)

We have created a script named QualityControl.R to carry out the quality control of EWAS results automatically, as indicated in the workflow in Figure1. This script needs some initial editing to indicate information specific to the study. Briefly, after reading the EWAS results file of each cohort and model (study), it starts by detecting duplicated CpGs and CpGs with missing data (NAs), and generates the initial descriptive for Beta, SE and P-value. Then CpGs that meet exclusion criteria (indicated by the user) are excluded from the file, and adjusted p-values and lambda are calculated. A new file with QCed EWAS results for each cohort and model (each study) is created, together with the QCed descriptive for Beta, SE, p-value, and adjusted P-values. Moreover, several plots by cohort and model are generated (distribution plots for SE and P-value, QQ plot, and Volcano plot). Finally, plots comparing QCed EWAS results from different cohorts and models are created (Beta’s boxplot and precision plot).

4.2 Initial definition of variables specific to the study

We need to define certain parameters specific to each study in order to run Module 1 - Quality Control of EWAS results. (this is the only part of the code that requires editing!).

4.2.1 Input data

We need to indicate where the files with the EWAS results of each cohort and model (each study) are stored by defining a character vector called ‘files.’ The example below involves three different cohorts and two distinct models for each cohort:

files <- c('data/PROJ1_Cohort3_Model1_date_v2.txt',
           'data/PROJ1_Cohort3_Model2_date_v2.txt',
           'data/PROJ1_Cohort2_Plate_ModelA1_20170309.txt',
           'data/PROJ1_Cohort2_Plate_ModelA2_20170309.txt',
           'data/Cohort1_Model1_20170713.txt',
           'data/Cohort1_Model2_20170713.txt')

files must contain at least the following variables :

probeID BETA SE P_VAL
cg13869341 0.00143514362777834 0.00963411132344512 0.678945643213567
cg24669183 -0.0215342789035512 0.0150948404044624 0.013452341234512
cg15560884 0.00156725345562218 0.0063878467810596 0.845523221223523

4.2.2 Where to store output

We need to define the folder where we want to save the QCed EWAS results of each cohort and model, indicating it in a character vector called result_folder. In the example, the QCed files will be stored in a folder named QC_Results.

# Result folder
results_folder <- 'QC_Results'

4.2.3 Define abbreviations for input files

To avoid complex and long file names we can use an abbreviated/simplified name for each of the files used as input. For example, PROJ1_Cohort3_Model1_date_v2 will be treated as PROJ1_Cohort3_A1, or PROJ1_Cohort2_Plate_ModelA1_20170309 as PROJ1_Cohort2_A2. The length of the prefix vector must be equal to the number of files used as input:

# Prefixes for each file
prefixes <- c('PROJ1_Cohort3_A1', 'PROJ1_Cohort3_A2',
              'PROJ1_Cohort2_A1','PROJ1_Cohort2_A2', 
              'Cohort1_A1', 'Cohort1_A2')

4.2.4 Illumina Array type and filter conditions

We need to indicate the type of Illumina array used to measure DNA methylation in each of the EWAS results files that we are QCing. Two different array types are possible: 450K or EPIC. They should be listed following the same order as in the character vector file or prefixes (see example). Filtering of CpGs will depend on the Illumina array, thus this field has to be completed.

# Array type, used : EPIC or 450K
artype <- c('450K', 'EPIC', '450K', 'EPIC', '450K', 'EPIC')

Module 1 - Quality Control of EWAS results, allows the exclusion of CpG probes that do not accomplish certain criteria. The criteria to designate “problematic” CpG probes are based on (Zhou, Laird, and Shen 2017), (Solomon et al. 2018), (Fernandez-Jimenez et al. 2019), and can be defined by the user in a character vector as indicated in the example below:

4.2.4.1 Perform CpG exclusions –> control and non-CpG probes, and probes targeting CpGs in sexual chromosomes:

  • Control probes (“control_probes”): Technical control probes that do not correspond to CpGs, such as bisulfite conversion I, bisulfite conversion II, extension, hybridization and negative.
  • Non-cpg probes (“noncpg_probes”): ”): Probes targeting non-CpG sites classified as “ch.”
  • Sex chromosomes (“Sex”): Probes targeting CpGs in sex chromosomes .

4.2.4.2 Perform CpG exclusions –> probes with hybridizing problems

  • Poor mapping probes (“MASK_mapping”): Probes that have poor quality mapping to the target genomic location as indicated in the array’s manifest file based on genome build GRCh37 and GRCh38 (for example due to the presence of INDELs (Insertion–deletion mutations variants present in the genome);
  • Cross-hybridising probes (“MASK_sub30”):”): The sequence of the last 30 bp at the 3’ end of the probe is non-unique (problematic because the methylation level of such probes is more likely to represent a combination of multiple sites and not the level of the initially targeted CpG site). (Zhou, Laird, and Shen 2017) recommend 30bp, but we also provide the option to considere other 3’ lenghts: 25 bp, 35 bp, 40 bp or 45 bp (“MASK_sub25,” “MASK_sub40,” “MASK_sub45”).

4.2.4.3 Perform CpG exclusions -> probes affected by the presence of SNPs

  • “MASK_extBase”: Probes with a SNP altering the CpG dinucleotide sequence context and hence the ability of target cytosines to be methylated (regardless of the MAF);
  • “MASK_typeINextBaseSwitch”: Probes with a SNP in the extension base that causes a colour channel switch from the official annotation (regardless of the MAF);
  • “MASK_snp5_GMAF1p”: whether 5bp 3’-subsequence (including extension for typeII) overlap with any of the SNPs with global MAF >1%.
  • “MASK_snp5_common”: whether 5bp 3’-subsequence (including extension for typeII) overlap with any of the common SNPs from dbSNP (global MAF can be under 1%).

4.2.4.4 Perform CpG exclusions -> probes giving inconsistent results between arrays

  • “Unrel_450_EPIC_blood”: These are CpG probes that give inconsistent methylation levels between the 450K and EPIC arrays in blood, suggesting that measurements are unreliable for at least one of these arrays. CpG probes based on (Solomon et al. 2018).
  • “Unrel_450_EPIC_pla_restrict” or “Unrel_450_EPIC_pla”: These are probes that give inconsistent methylation levels between the 450K and EPIC arrays in placenta, suggesting that measurements are unreliable for at least one of these arrays. CpG probes based on (Fernandez-Jimenez et al. 2019) .

In this example we filter CpGs that meet the following conditions: MASK_sub30_copy, MASK_extBase, MASK_mapping, MASK_typeINextBaseSwitch, control_probes, Unrel_450_EPIC_blood and Sex. This is the recommended configuration for studies conducted in blood from European ancestry individuals using the 450K array

# Parameters to exclude CpGs
exclude <-  c( 'MASK_sub30_copy', 
               'MASK_extBase', 
               'MASK_mapping', 
               'MASK_typeINextBaseSwitch', 
               'control_probes', 
               'MASK_snp5_common',
               'Unrel_450_EPIC_blood', 
               'Sex')

4.2.4.5 Ethnicity

We need to define the ethnic origin of the study population for each of the EWAS results files used as input in Module 1 - Quality Control module. . Ethnic origin can be indicated according to the code from the table below. If the population is very diverse then select GMAF1p.

Population Code Population Description Super Population Code
AFR African AFR
AMR Ad Mixed American AMR
EAS East Asian EAS
EUR European EUR
SAS South Asian SAS
CHBB Han Chinese in Beijing, China EAS
JPT Japanese in Tokyo, Japan EAS
CHS Southern Han Chinese EAS
CDX Chinese Dai in Xishuangbanna, China EAS
KHV Kinh in Ho Chi Minh City, Vietnam EAS
CEU Utah Residents (CEPH) with Northern and Western European Ancestry EUR
TSI Toscani in Italia EUR
FIN Finnish in Finland EUR
GBR British in England and Scotland EUR
IBS Iberian Population in Spain EUR
YRI Yoruba in Ibadan, Nigeria AFR
LWK Luhya in Webuye, Kenya AFR
GWD Gambian in Western Divisions in the Gambia AFR
MSL Mende in Sierra Leone AFR
ESN Esan in Nigeria AFR
ASW Americans of African Ancestry in SW USA AFR
ACBB African Caribbeans in Barbados AFR
MXL Mexican Ancestry from Los Angeles USA AMR
PUR Puerto Ricans from Puerto Rico AMR
CLM Colombians from Medellin, Colombia AMR
PEL Peruvians from Lima, Peru AMR
GIH Gujarati Indian from Houston, Texas SAS
PJL Punjabi from Lahore, Pakistan SAS
BEBB Bengali from Bangladesh SAS
STU Sri Lankan Tamil from the UK SAS
ITU Indian Telugu from the UK SAS
GMAF1p If population is very diverse
ethnic <- c('EUR', 'EUR', 'SAS', 'SAS', 'EUR', 'EUR')

4.2.5 Other variables :

To create a precision plot and also to conduct the EWAS meta-analysis we need to indicate the number of samples included in each of the EWAS results files. We indicate this information in a vector called N. In addition, for case-control EWAS, we need to know the sample size of exposed or diseased individuals in each of the EWAS results files. This information is indicated in a vector called n.

N <- c(100, 100, 166, 166, 240, 240 )
n <- c(NA)

4.3 Overview of the QualityControl.R code

Note: all this part of the code can be run without any editing from the researcher! We present it in case some researchers want to modify or to adapt it to their needs.

This code will be executed for each EWAS results file defined in the variable files. . However, in this example we only show the analysis workflow for one study (one cohort and model). The complete code can be found in QualityControl.R .

# Variable declaration to perform precision plot
medianSE <- numeric(length(files))
value_N <- numeric(length(files))
cohort_label <- character(length(files))

# Prepare output folder for results (create if not exists)
if(!dir.exists(file.path(getwd(), results_folder )))
   suppressWarnings(dir.create(file.path(getwd(), results_folder)))


# IMPORTANT FOR A REAL ANALYSIS :

# To show the execution flow we perform the analysis with only one data
# file. Normally, we have more than one data file to analyze, for that
# reason, we execute the code inside a loop and we follow the execution
# flow for each file defined in `files` 
# So we need to uncomment the for instruction and remove i <- 1 assignment.

# for ( i in 1:length(files) )
# {

   # we force i <- 1 to execute the analysis only for the first variable
   # for real data we have to remove this line
   i <- 1
   
   # Prepare output subfolder for cohort-model results (create if not exists)
   if(!dir.exists(file.path(getwd(), results_folder, prefixes[i] )))
      suppressWarnings(dir.create(file.path(getwd(), results_folder, prefixes[i])))

   # Creates an empty file to resume all data if an old file exist  removes
   # the file and creates a new one
   fResumeName <- paste0( file.path(getwd(), results_folder, prefixes[i]),"/",prefixes[i], "_descriptives.txt")
   if ( file.exists(fResumeName) ) {
      file.remove(fResumeName)
   }
#> [1] TRUE
   file.create(fResumeName)
#> [1] TRUE
   # Read data.
   cohort <- read.table(files[i], header = TRUE, as.is = TRUE)
   print(paste0("Cohort file : ",files[i]," - readed OK", sep = " "))
#> [1] "Cohort file : data/PROJ1_Cohort3_Model1_date_v2.txt - readed OK "

First, the code reads the content of the file with EWAS results,

# Read data.
cohort <- read.table(files[i], header = TRUE, as.is = TRUE)
print(paste0("Cohort file : ",files[i]," - readed OK", sep = " "))
#> [1] "Cohort file : data/PROJ1_Cohort3_Model1_date_v2.txt - readed OK "

Then, it stores the content in a variable named cohort.

After that, the code executes the steps fo the quality control (Figure1). These are the functions that are used:

  1. clean_NA_from_data function: This function remove the rows (CpGs) with NA in any of the variables and return only complete cases.
# Remove rows with NA from data
cohort <- clean_NA_from_data(cohort)
  1. descriptives_CpGs function: This function performs a descriptive analysis of main variables ( Beta, SE and P-value) in the EWAS results file (QC_Results/PROJ1_Cohort3_A1_descriptives.tx).
# Descriptives - Before CpGs deletion #
descriptives_CpGs(cohort, c("BETA", "SE", "P_VAL"), fResumeName, before = TRUE)
  1. test_duplicate_CpGs function: This function tests whether there are any duplicated CpGs in the EWAS results file. If there are, the script execution stops and writes the duplicates and descriptive related to these duplicates in a file.
# Remove duplicates
test_duplicate_CpGs(cohort, "probeID", paste0(results_folder,'/',prefixes[i],'_duplicates.txt') )
  1. filterLowRepresentedCpGsinCohort function: This function removes low represented CpGs in a given cohort and model. Low represented CpGs are defined as those CpGs not assessed in a given % of the subjects of the study as indicated in the pcMissingSamples (by default 0.9) . This data is reported in descriptive resume. To apply this filtering, the EWAS results file has to have a column named as defined in colname_NforProbe parameter, (‘N_for_probe’). This data is updated in descriptive resume.
# Remove cpGs with low representation
   # first, we test if colname_NforProbe and pcMissingSampes are defined
   if( !exists("colname_NforProbe") ) { colname_NforProbe <- NULL }
   if( !exists("pcMissingSamples") ) { pcMissingSamples <- NULL }

# Remove cpGs with low representation
cohort <- filterLowRepresentedCpGsinCohort(cohort, 
                                           colname_NforProbe, 
                                           pcMissingSamples, N[i], 
                                           fileresume = fResumeName )
  1. exclude_CpGs function: This function excludes the list of CpGs that the user has selected (section 4.2.4). The function uses the parameters defined before in the exclude variable ,which are the data, cohort, the CpG id field (can be the column number or field name “probeId”), ”), the filters to apply, and, optionally, a file name where to save excluded CpGs and the exclusion reason (in this case the file name would be QC_Results/PROJ1_Cohort3_A1_excluded.txt).

 

# Exclude CpGs not meet conditions
   cohort <- exclude_CpGs(cohort, "probeID", 
                          exclude, 
                          ethnic = ethnic[i], 
                          filename = paste0(results_folder, '/',prefixes[i], '/',prefixes[i],'_excluded.txt'), 
                          fileresume = fResumeName, artype = artype[i] )

 

  1. descriptives_CpGs function (bis): After eliminating CpGs, we proceed to carry out another descriptive analysis with the same function as above.At the end of each iteration we get a file named xxx with descriptive before and after quality control, the number of excluded CpGs, and the number of statistically significant CpGs at different p-value theresholds: nominal and after multiple testing correction with the False Discovery Rate (FDR) and Bonferroni methods.

 

# Descriptives - After CpGs deletion #
 descriptives_CpGs(cohort, c("BETA", "SE", "P_VAL"), 
                   fResumeName, before = FALSE )

 

  1. adjust_data function: This function obtains the adjusted p-values after multiple testing correction applying the Bonferroni and FDR methods. We need to indicate in which column the p-value is found and what adjustment we want. By default, the function adjusts data by Bonferroni (bn) and FDR (fdr).

This function returns the input data adding two new columns corresponding to adjusted Bonferroni and FDR p-values. This function returns the input data adding two new columns corresponding to these adjustments. As in other functions seen before, optionally, we can get a data summary with the number of statistically significant p-values by Bonferroni or FDR, in a text file (the generated file in the example is called QC_Results/PROJ1_Cohort3_A1_ResumeSignificatives.txt ).

 

# data before adjustment
head(cohort)
#>      probeID         BETA          SE     P_VAL CpG_chrm CpG_beg CpG_end
#> 1 cg00002593 -0.001417333 0.010439809 0.8920091     chr1 1333412 1333414
#> 3 cg00014118 -0.006369144 0.016149771 0.6933006     chr1 2004121 2004123
#> 4 cg00040588  0.001088620 0.013553046 0.9359805     chr1 1355331 1355333
#> 5 cg00060374 -0.017876817 0.030803617 0.5616800     chr1 1419854 1419856
#> 6 cg00078456 -0.010499699 0.008940391 0.2402302     chr1 1629041 1629043
#> 7 cg00082285  0.002490867 0.009242993 0.7875549     chr1 1161889 1161891
#>   MASK_snp5_EUR probeType Unrel_450_EPIC_blood MASK_mapping
#> 1         FALSE        cg                FALSE        FALSE
#> 3         FALSE        cg                FALSE        FALSE
#> 4         FALSE        cg                FALSE        FALSE
#> 5         FALSE        cg                FALSE        FALSE
#> 6         FALSE        cg                FALSE        FALSE
#> 7         FALSE        cg                FALSE        FALSE
#>   MASK_typeINextBaseSwitch MASK_rmsk15 MASK_sub40_copy MASK_sub35_copy
#> 1                    FALSE       FALSE           FALSE           FALSE
#> 3                    FALSE        TRUE           FALSE           FALSE
#> 4                    FALSE       FALSE           FALSE           FALSE
#> 5                    FALSE       FALSE           FALSE           FALSE
#> 6                    FALSE       FALSE           FALSE           FALSE
#> 7                    FALSE       FALSE           FALSE           FALSE
#>   MASK_sub30_copy MASK_sub25_copy MASK_snp5_common MASK_snp5_GMAF1p
#> 1           FALSE           FALSE            FALSE            FALSE
#> 3           FALSE           FALSE            FALSE            FALSE
#> 4           FALSE           FALSE            FALSE            FALSE
#> 5           FALSE           FALSE            FALSE            FALSE
#> 6           FALSE           FALSE            FALSE            FALSE
#> 7           FALSE           FALSE            FALSE            FALSE
#>   MASK_extBase MASK_general Unrel_450_EPIC_pla_restrict Unrel_450_EPIC_pla
#> 1        FALSE        FALSE                       FALSE              FALSE
#> 3        FALSE        FALSE                       FALSE              FALSE
#> 4        FALSE        FALSE                       FALSE              FALSE
#> 5        FALSE        FALSE                       FALSE              FALSE
#> 6        FALSE        FALSE                       FALSE              FALSE
#> 7        FALSE        FALSE                       FALSE              FALSE
# Adjust data by Bonferroni and FDR
# Adjust data by Bonferroni and FDR
   cohort <- adjust_data(cohort, "P_VAL", 
                         bn=TRUE, 
                         fdr=TRUE, fResumeName, N[i] )

# data after adjustment
head(cohort)
#>        probeID         BETA          SE        P_VAL CpG_chrm CpG_beg CpG_end
#> 609 cg10983617 -0.069679611 0.019136276 0.0002713369     chr1 1043283 1043285
#> 409 cg07426077  0.017157678 0.004776728 0.0003282363     chr1 1553200 1553202
#> 181 cg03538326 -0.021234215 0.006503119 0.0010937309     chr1 1440464 1440466
#> 954 cg16679343 -0.021484492 0.006807270 0.0015988857     chr1 1117568 1117570
#> 606 cg10943191 -0.003590962 0.001237162 0.0037010125     chr1 1918974 1918976
#> 547 cg09932345  0.004092212 0.001453730 0.0048781063     chr1 1779121 1779123
#>     MASK_snp5_EUR probeType Unrel_450_EPIC_blood MASK_mapping
#> 609         FALSE        cg                FALSE        FALSE
#> 409         FALSE        cg                FALSE        FALSE
#> 181         FALSE        cg                FALSE        FALSE
#> 954         FALSE        cg                FALSE        FALSE
#> 606         FALSE        cg                FALSE        FALSE
#> 547         FALSE        cg                FALSE        FALSE
#>     MASK_typeINextBaseSwitch MASK_rmsk15 MASK_sub40_copy MASK_sub35_copy
#> 609                    FALSE       FALSE           FALSE           FALSE
#> 409                    FALSE       FALSE           FALSE           FALSE
#> 181                    FALSE       FALSE           FALSE           FALSE
#> 954                    FALSE       FALSE           FALSE           FALSE
#> 606                    FALSE       FALSE           FALSE           FALSE
#> 547                    FALSE       FALSE           FALSE           FALSE
#>     MASK_sub30_copy MASK_sub25_copy MASK_snp5_common MASK_snp5_GMAF1p
#> 609           FALSE           FALSE            FALSE            FALSE
#> 409           FALSE            TRUE            FALSE            FALSE
#> 181           FALSE           FALSE            FALSE            FALSE
#> 954           FALSE            TRUE            FALSE            FALSE
#> 606           FALSE           FALSE            FALSE            FALSE
#> 547           FALSE           FALSE            FALSE            FALSE
#>     MASK_extBase MASK_general Unrel_450_EPIC_pla_restrict Unrel_450_EPIC_pla
#> 609        FALSE        FALSE                       FALSE              FALSE
#> 409        FALSE        FALSE                       FALSE              FALSE
#> 181        FALSE        FALSE                       FALSE              FALSE
#> 954        FALSE        FALSE                       FALSE              FALSE
#> 606        FALSE        FALSE                       FALSE              FALSE
#> 547        FALSE        FALSE                       FALSE              FALSE
#>     padj.bonf  padj.fdr
#> 609        no 0.1971059
#> 409        no 0.1971059
#> 181        no 0.4378569
#> 954        no 0.4800654
#> 606        no 0.8422835
#> 547        no 0.8422835

 

  1. write_QCData function: TThis function annotates the QCed EWAS results using the the Illumina annotations for 450K or EPIC arrays (as indicated by the user) and saves it. The file generated by this function is the input for Module 2 - Meta-analysis of EWAS results. This data is stored with the *_QC_Data.txt* sufix.

 

# Write QC complete data to external file
write_QCData(cohort, paste0(results_folder, '/',prefixes[i], '/',prefixes[i]))

 

4.4 Quality Control - code for plots

 

The quality control process is summarized in different plots.

 

  1. plot_distribution function: It generates, for each cohort and model, several plots: a plot with the distribution of SEs, a plot with the distribritution of p-values, and a QQ plot of expected versus observerd p-values.

 

   ## Visualization - Plots

   # Distribution plot
   plot_distribution(cohort$SE, 
                     main = paste('Standard Errors of', prefixes[i]), 
                     xlab = 'SE')
\label{fig:QCcodedistrplotse}SE distribution plot

Figure 2: SE distribution plot

 

   ## Visualization - Plots

   plot_distribution(cohort$P_VAL, 
                     main = paste('p-values of', prefixes[i]), 
                     xlab = 'p-value')
\label{fig:QCcodedistrplotpval}p-value distribution plot

Figure 3: p-value distribution plot

 

   # QQ-plot.
   qqman::qq(cohort$P_VAL,
             main = sprintf('QQ-plot of %s (lambda = %f)', prefixes[i], 
                            lambda = get_lambda(cohort,"P_VAL")))
\label{fig:QCcodeqqplot}QQ-plots

Figure 4: QQ-plots

 

  1. plot_volcano function: It generates a Volcano plot for each cohort and model.

 

   # Volcano plot.
   plot_volcano(cohort, "BETA", "P_VAL", 
                main=paste('Volcano plot of', prefixes[i]) )
\label{fig:QCCodeVolcanoplot}Volcano Plot

Figure 5: Volcano Plot

 

  1. plot_precisionp function:It performs a precision plot to compare SE (precision) vs study sample size (or sample size of cases) across cohorts and models.

 

plot_precisionp(precplot.data.n, 
                paste(results_folder,  "precision_SE_N.png", sep='/'), 
                main = "Subgroup Precision Plot -  1/median(SE) vs sqrt(n)")

 

\label{fig:invchr17}Precision plot for 7 different datasets

Figure 6: Precision plot for 7 different datasets

 

  1. plot_betas_boxplot function: It performs a box plot of Betas across cohorts and models

 

plot_betas_boxplot(betas.data, 
                   paste(results_folder, 'BETAS_BoxPlot.pdf', sep="/"))

 

\label{fig:QCPlotBetasBox}Betas Boxplot plot for 10 different datasets

Figure 7: Betas Boxplot plot for 10 different datasets

 

  1. plot_venndiagram function: We need to define the venn diagram for a maximum of 5 datasets. Here we define which models and cohorts we want to be shown in the Venn diagram. In this example we define two different Venn diagrams, one with “Cohort1_A1,” “PROJ1_Cohort2_A1,” “PROJ1_Cohort2_B1,” “PROJ1_Cohort2_C1” and “PROJ1_Cohort3_A1” datasets and the other with five more datasets “Cohort1_A2,” “PROJ1_Cohort2_A2,” “PROJ1_Cohort2_B2,” “PROJ1_Cohort2_C2” and “P1_Cohort3_A2.” We can plot VennDiagram with adjusted data by Bonferroni or FDR.
# Initiall we had defined the venn_diagrams variable to configure each venn diagram 
# Venn diagrams
venn_diagrams <- list(
   c("Cohort1_A1", "PROJ1_Cohort2_A1", "PROJ1_Cohort2_B1", "PROJ1_Cohort2_C1", "PROJ1_Cohort3_A1" ),
   c("Cohort1_A2", "PROJ1_Cohort2_A2", "PROJ1_Cohort2_B2", "PROJ1_Cohort2_C2", "P1_Cohort3_A2" )
)

# Venn_Diagrams()
   for (i in 1:length(venn_diagrams))
      plot_venndiagram(venn_diagrams[[i]], qcpath = results_folder, plotpath = results_folder, bn='padj.bonf', fdr='padj.fdr')
\label{fig:QCPlotVennDiagram}Venn Diagram

Figure 8: Venn Diagram

5 Meta-Analysis with GWAMA

5.1 Meta-Analysis flowchart

To perform the meta-analyses, we use GWAMA, a Software tool for meta-analysis developed by the Institute of Genomics from University of Tartu. This software is available at https://genomics.ut.ee/en/tools/gwama-download.

Like Module 1 of the EASIER R package, we have created a script named MetaAnalysis.R to carry out the Module 2 - Meta-analysis of EWAS results (workflow in Figure Figure9). This script needs some minor editing to indicate information specific to the study. Briefly, it starts formatting the QCed EWAS results and preparing configuration files to run GWAMA. Then, it runs the inverse-variance weighted meta-analysis (fixed- and random-effects are tested), annotates the results and performs some quality control checks (lamda, QQ plot, heterogeneity (\(I^2\)), etc). Finally, it performs forest plots of the top CpGs and Venn diagrams to compare results between meta-analyses

\label{fig:metaworkflow}Meta-analysis flowchart. The script MetaAnalysis.R, contains the code to perform the steps indicated in the flowchart.The only editing required by the researcher is defining the initial variables, which are specific to each study. The rest of the script is automatic and does not need any editing.

Figure 9: Meta-analysis flowchart
The script MetaAnalysis.R, contains the code to perform the steps indicated in the flowchart.The only editing required by the researcher is defining the initial variables, which are specific to each study. The rest of the script is automatic and does not need any editing.

5.2 Initial definition of variables specific to the study

Like in Module 1, in Module 2 - Meta-analysis of EWAS results, we need to define some initial parameters, which are specific to each study. (this is the only part of the code that requires editing!). These variables are:

  1. artype: Type of array, either EPIC or 450K. If you are combining studies with data from the EPIC array and studies with data from the 450K array, remember to run Module 1 - Quality control of EWAS results selecting ‘450K’ if you only want to consider CpG probes from the 450K array, or selecting ‘EPIC’ if you want to consider all the CpG probes from EPIC even though for some cohorts you will have more than 50% missing’s.

  2. metafiles: List of files with QCed EWAS results that will be combined and analysed in each meta-analysis model. In the example below we have defined two different meta-analysis models, MetaA1 and MetaA2. In the first one, MetaA1, we will meta-analyse EWAS results from ‘PROJ1_Cohort3_A1,’ ‘PROJ1_Cohort2_A1’ and ‘Cohort1_A1.’

  3. pcentMissing: We can also exclude from the meta-analysis those CpGs not present in a % of the studies, by setting the minimum representation percentage. By default, we have set up the percentage to 80%. This means the code will execute the meta-analysis twice, one with all CpGs and another with only CpGs present in 80% of the studies. Note: If you are combining data from 450K and EPIC arrays and you selected ‘EPIC’ as the artype, with this setting you are able to choose running a meta-analysis in parallel only considering CpGs with high representation and therefore eliminating those CpGs only present in the EPIC array.

  4. results_folder: To indicate the folder where to save the output of the meta-analysis, we recommend to set it to “QC_Results.”

  5. venndiag_threshold: To indicate the p-value threshold to consider CpGs as statistically significant (by default p-value <0.05).

  6. venn_diagrams: To compare the overlap of significant CpGs between meta-analysis. P-value significance threshold is configured in venndiag_threshold variable, by default this value is 0.05. In the example we are comparing the results of meta-analysis 1 (MetaA1) and meta-analysis 2 (MetaA2). .

  7. gwama.dir: To define the GWAMA execution path.

 

## -- Variable definition for Meta-Analysis -- ##

# Array type, used : EPIC or 450K
artype <- '450K'

# Define data for each meta-analysis
metafiles <- list(
   'MetaA1' = c('Cohort1_A1','PROJ1_Cohort2_A1', 'PROJ1_Cohort3_A1' ),
   'MetaA2' = c('Cohort1_A2','PROJ1_Cohort2_A2', 'PROJ1_Cohort3_A2' ))

# Define maximum percent missing for each CpG
pcentMissing <- 0.8 # CpGs with precense lower than pcentMissing after EWAS
                    # meta-analysis will be deleted from the study.

# Paths with QCResults and path to store GWAMA results
results_folder <- 'QC_Results'
results_gwama <- '.'

# Venn diagrams ==> IMPORTANT : maximum 5 meta-analysis by venn diagram
venndiag_threshold <- 0.05

venn_diagrams <- list(
   c("MetaA1", "MetaA2" )
)

## -- End Variable definition for Meta-Analysis -- ##
# GWAMA binary path  (GWAMA IsGlobal Server sw05 and sw06 installation)
gwama.dir <- paste0(Sys.getenv("HOME"), 
                    "/data/EWAS_metaanalysis/1_QC_results_cohorts/GWAMA/")

5.2.1 Overview of the MetaAnalysis.R code

Note: all this part of the code can be run without any editing from the researcher! We present it in case some researchers want to modify it to adapt to their needs.

The meta-analysis is performed twice: first with all CpGs; and second with CpGs present in x% of the studies (as indicated in the pcentMissing variable). In this example, however, we only run the first meta-analysis but in a full run script both meta-analyses are performed.

First, the code creates the needed folders. In this example we create a GWAMA folder where we will put the input files for GWAMA, and a GWAMA_Results folder where we will store the results. When we finish the code execution the GWAMA folder with temporal configuration files will be removed.

 

## Create directory for GWAMA configuration files and GWAMA_Results 
## inside the defined results_gwama variable defined before.
if(!dir.exists(file.path(getwd(), 
                         paste(results_gwama, "GWAMA", sep="/") )))
   suppressWarnings(dir.create(file.path(getwd(), 
                                         paste(results_gwama, "GWAMA", sep="/"))))

## Create directory for GWAMA_Results
outputfolder <- paste0(results_gwama, "/GWAMA_Results")
if(!dir.exists(file.path(getwd(), outputfolder )))
   suppressWarnings(dir.create(file.path(getwd(), outputfolder)))

# We create a map file for GWAMA --> Used in Manhattan plots.
# We only need to indicate the array type
hapmapfile <- paste(results_gwama,"GWAMA", "hapmap.map" ,sep = "/")
generate_hapmap_file(artype, hapmapfile)

We also create some subfolders to store configuration files inside the GWAMA folder, one subfolder with configuration files for each meta-analysis model, for example, for MetaA1 meta-analysis we create the path GWAMA\MetaA1.

 

list.lowCpGs <- NULL

# Create folder for a meta-analysis in GWAMA folder, here we 
# store the GWAMA input files for each meta-analysis,
# We create one for complete meta-analysis
if(!dir.exists(file.path(getwd(), 
                         paste(results_gwama,"GWAMA", names(metafiles)[metf],
                               sep="/") )))
   suppressWarnings(dir.create(file.path(getwd(), 
                                         paste(results_gwama,"GWAMA", 
                                               names(metafiles)[metf], 
                                               sep="/"))))
# We create another for meta-analysis without filtered CpGs with low 
# percentage (sufix _Filtr)
if(!dir.exists(file.path(getwd(), 
                         paste0(results_gwama,"/GWAMA/", 
                                names(metafiles)[metf],
                                "_Filtr") )))
   suppressWarnings(dir.create(file.path(getwd(), 
                                         paste0(results_gwama, "/GWAMA/",
                                                names(metafiles)[metf],
                                                "_Filtr"))))

# GWAMA File name base
inputfolder <- paste0(results_gwama,"/GWAMA/",  names(metafiles)[metf])

modelfiles <- unlist(metafiles[metf])

# Execution with all CpGs and without filtered CpGs
runs <- c('Normal', 'lowcpgs') 
lowCpGs = FALSE;
outputfiles <- list()

outputgwama <- paste(outputfolder,names(metafiles)[metf],sep = '/')

Now we are ready to execute the meta-analysis of QCed EWAS results following the steps indicated in the workflow MetaAnalysis.R. These are the functions that will be used:

 

  1. create_GWAMA_files:

 

To modify EWAS QCed results to GWAMA format and create GWAMA configuration files. This function specifies

  • the GWAMA folder created before in the qcpath parameter
  • a character vector with all the meta-analysis models (previously defined in the metafiles variable)
  • the folder with the input data (these are the QC_Dataoutput files from the Module 1 - Quality Control of EWAS results),
  • the number of samples in the study,
  • we need to indicate if this is the execution with all CpGs or with a subset. If not, we indicate the list with excluded CpGs, which can be obtained with get_low_presence_CpGs function

 

 if(runs[j]=='lowcpgs') {
   lowCpGs = TRUE
   # Get low presence CpGs in order to exclude this from the new meta-analysis
   list.lowCpGs <- get_low_presence_CpGs(outputfiles[[j-1]], pcentMissing)
   inputfolder <- paste0(results_gwama,"/GWAMA/",  names(metafiles)[metf], "_Filtr")
   outputgwama <- paste0(outputgwama,"_Filtr")
 }

 # Create a GWAMA files for each file in meta-analysis and one file with 
 # gwama meta-analysis configuration
 for ( i in 1:length(modelfiles) )
   create_GWAMA_files(results_folder,  modelfiles[i], 
                      inputfolder, N[i], list.lowCpGs )

 

  1. run_GWAMA_MetaAnalysis

 

To run both fixed-effects and random-effects inverse-variance weighted meta-analsyis with GWAMA. This function needs

  • the folder with data to be analysed, (this is the GWAMA folder),
  • where to store the meta-analysis results (by default this function creates a subfolder with name of the meta-analysis). The following files are stored there: meta-analysis results, Manhattan plots and QQ plots, both for fixed-effects and random-effects,
  • the name of the meta-analysis model
  • where is the GWAMA binary installed

All these parameters have been specified in section 5.2 Initial definition of variables specific to the study.

 

 # Execute GWAMA meta-analysis and manhattan-plot, QQ-plot and a file 
 # with gwama results.
 outputfiles[[runs[j]]] <- run_GWAMA_MetaAnalysis(inputfolder, 
                                                 outputgwama, 
                                                 names(metafiles)[metf], 
                                                 gwama.dir)
\label{fig:manhat}Manhattan plot obtained with GWAMA

Figure 10: Manhattan plot obtained with GWAMA

 

  1. get_descriptives_postGWAMA:

To make a quality control check of the meta-analyses results. This function is similar to what was done in Module 1 - Quality control of EWAS results, but instead of being applied to the cohort specific EWAS results, it is applied to the meta-analysis EWAS results. This function does: - calcutes adjusted p-values (FDR and Bonferroni), - annotates CpGs, - generates a file with a summary of the meta-analysis results, - generates plots showing the distribution of the heterogeneity, SE, and p-values, - generates QQ-plots with lambda, Manhattan plots and Volcano-plots.

 

 # Post-metha-analysis QC --- >>> adds BN and FDR adjustment
 dataPost <- get_descriptives_postGWAMA(outputgwama, 
                                       outputfiles[[runs[j]]], 
                                       modelfiles, 
                                       names(metafiles)[metf], 
                                       artype, 
                                       N[which(prefixes %in% modelfiles)] )
\label{fig:heteroi2}Heterogeneity distribution plot (i2)

Figure 11: Heterogeneity distribution plot (i2)

 

  1. plot_ForestPlot:

 

To generate forest plots for the top 30 statistically significant CpGs of the fixed- and random-effects models,

 

 # Forest-Plot
 plot_ForestPlot( dataPost, metafiles[[metf]], runs[j], 
                 inputfolder, names(metafiles)[metf], files, outputgwama  )
\label{fig:Forestplot}Forest plot for cpg22718050

Figure 12: Forest plot for cpg22718050

 

  1. venn_diagrams:

 

To generate Venn diagrams comparing the overlap of statistcially significant CpGs between meta-analysis models. CpGs with FDR or BN corrected p-values can be selected for this comparison.

 

for (i in 1:length(venn_diagrams))
   plot_venndiagram(venn_diagrams[[i]], qcpath = outputfolder, 
                    plotpath =  paste0(results_gwama, "/GWAMA_Results"), 
                    pattern = '_Fixed_Modif.out',bn='Bonferroni', fdr='FDR',
                    venndiag_threshold)

 

this is venn diagram output example

\label{fig:vennplot} Venn diagram example

Figure 13: Venn diagram example

6 Enrichment

6.1 Enrichment Flowchart

Like the other EASIER modules, for Module 3 – Functional enrichment of EWAS results, we have created a script named Enrichment.R. This script needs some minor editing to indicate information specific to the study

 

The script follows the workflow indicated in Figure14. Briefly, the first step the one that needs editing by the user, is to define initial parameter specific to the study. Then we read the files with the list of CpGs for which we want to test for the enrichment. If the file only contains CpG names, we start annotating these CpGs to genes and to other molecular features. After that, the script performs gene-set enrichment (at the gene level), by applying missmethyl R package using the databases: GO terms, KEGG and Molecular Signatures DB (MSigDB) and enrichment with ConsensusPathDB using brgeEnrich package https://github.com/isglobal-brge/brgeEnrich. Then, it performs enrichment for gene and CpG island relative positions. In this case, the enrichment is performed for all CpGs as well as for hypo and hyper-methylated CpGs, given that coefficients of the association are provided in the list of CpGs. Finally it performs molecular enrichment specific for these tissues: blood, placenta or other. For blood, it tests the enrichment for blood chromatin states. For placenta, it test the enrichment for placenta chromatin states, for placenta partially methylated domains (PMDs) (Schroeder and LaSalle 2013) and for placenta imprinted regions (Hamada et al. 2016).

 

\label{fig:enrichworkflow}Functional enrichment flowchart. This flowchart is used in the script under test folder to perform the enrichment (Enrichment.R). The most important step in this workflow is the first step where we have to define the variables, if variables are well defined all the process is 'automatic'

Figure 14: Functional enrichment flowchart
This flowchart is used in the script under test folder to perform the enrichment (Enrichment.R). The most important step in this workflow is the first step where we have to define the variables, if variables are well defined all the process is ‘automatic’

 

6.2 Initial definition of variables specific to the study

 

Like in the other modules of the EASIER R package, in Module 3 - Functional Enrichment if EWAS results we need to define some initial variables, which are specific to the study (this is the only part of the code that has to be edited!).

These variables are:

 

1. Working directory:

 

First, we need to set up the working directory. By default, we set the working directory to the meta-analysis folder.

 

# Set working directory to metaanalysis folder
setwd("<path to metaanalysis folder>/metaanalysis")

 

2. Files to enrich:

 

To indicate directory and files with the CpGs to be used for the enrichment. We need to indicate the path to the files with the list of CpGs to be tested for the functional enrichment (FilesToEnrich variable). These files can contain, either a list of CpG names or the results from Module 2 - Meta-analysis of EWAS results including the associated p-values and Betas together with the CpG names

 

# Files with CpG data to enrich may be a CpGs list or annotated GWAMA output
FilesToEnrich <- c('toenrich/CpGstoEnrich.txt',
                   'GWAMA_Results/MetaA1/MetaA1_Fixed_Modif.out'
                    )

 

3. P-value threshold:

 

In the case of performing the functional enrichment using the results of meta-analysis results, we need to define the p-value threshold to select the CpGs to test for the enrichment. One of these options is possible:

  • BN: CpGs that accomplish with Bonferroni criteria, possible values : TRUE or FALSE
  • FDR: Indicate FDR adjusted p-value threshold to select the list of CpGs for which we want to perform the enrichment . If FDR = NA, FDR is not taken in to a ccount.
  • pvalue: Indicated nominal p-value threshold to select the list of CpGs for which we want to perform the enrichment. If p-value = NA, pvalue is not taken in to account.

 

# Values for adjustment
BN <-  TRUE       # Use Bonferroni ?
FDR <- 0.05        # significance level for adjustment, if NA FDR is not used
pvalue <- 0.05    # significance level for p-value, if NA p-value is not used

 

4. Array type:

 

artype, type of array: 450K or EPIC.

 

# Array type, used : EPIC or 450K
artype <- '450K'

 

5. Indicate folders for input and output data:

 

We have to indicate the folders with the CpGs to enrich (either the meta-analysis results from EASIER module 2 or a file with a list of CpG names), and the folder to store results fo the functional enrichment analyses.

 

# Result paths definition for QC, Meta-Analysis and Enrichment
results_folder <- 'QC_Results'
results_gwama <- '.'
results_enrich <- 'Enrichment'

 

6. Enrichment type:

 

To define the type of enrichment to be performed (enrichtype variabe). Two options are possible:

  • Blood: It performs the gene-set enrichment, the enrichment for gene and CpG island relative positions plus a functional enrichment specific for blood (ROADMAP chromatin states and cis-eQTMs).

  • Placenta: It performs the gene-set enrichment, the enrichment for gene and CpG island relative positions plus functional enrichments specific for placenta (ROADMAP fetal placenta chromatin states, placenta partially methylated domains (PMDs), and placenta imprinted regions). When performing this enrichment, we have to also define enrichFP18 = TRUE if we want to use fetal placenta 18 chromatin states, or enrichFP18 = FALSE if want to use fetal placenta 15 chromatin states. The difference between the two is the number of chromatin states considered (18 or 15).

 

# Enrichment type :  'BLOOD' or 'PLACENTA'
#     if enrichtype <- 'BLOOD' => enrichment with : 
#                          Cromatine States : BLOOD (crom15)
#                          (To be implemented in future) Partially Methylated Domains (PMD) for Blood
#     if enrichtype <- 'PLACENTA' => enrichment with: 
#                          Cromatine States : PLACENTA (FP_15) optionally (FP_18)
#                          Partially Methylated Domains (PMD) for Placenta
#     if enrichtype is different from 'BLOOD' and 'PLACENTA' 
#     we only get the missMethyl and MSigDB enrichment and the Unique genes list.
enrichtype <- 'PLACENTA'

# Cromatine States Placenta Enrichment FP_18
# if enrichFP18 = TRUE the enrichment is performed wit FP_15 and FP_18
enrichFP18 <- FALSE

 

7. Enrichment statistical test:

 

To indicate which statistical test we want to apply for the enrichment we define the variable testdata. It can be Fisher or Hypergeometric (by default: Fisher).

 

# Test to be used: 'Fisher' or 'Hypergeometric' for different values no test will be performed
testdata <- 'Fisher'

 

Then the code creates several folders where to store the results. No editing is needed in this part..

 

## Check if we have any files to enrich and if these files exists
if (length(FilesToEnrich)>=1 & FilesToEnrich[1]!='') {
   for ( i in 1:length(FilesToEnrich))
      if (!file.exists(FilesToEnrich[i])) {
         stop(paste0('File ',FilesToEnrich[i],' does not exsits, please check file' ))
      }
}

## Check variables

if( ! toupper(enrichtype) %in% c('PLACENTA','BLOOD') )
   warning('Only enrichment with MyssMethyl and MSigDB will be done')

if( ! tolower(testdata) %in% c('fisher','hypergeometric') )
   warning('Wrong value for testdata variable, values must be "Fisher" or "Hypergeometric". No test will be performed ')



# Convert relative paths to absolute paths for FilesToEnrich
FilesToEnrich <- unlist(sapply(FilesToEnrich, 
                               function(file) { if(substr(file,1,1)!='.' & substr(file,1,1)!='/') 
                                  file <- paste0('./',file) 
                               else 
                                  file }))
FilesToEnrich <- sapply(FilesToEnrich, file_path_as_absolute)

if(results_enrich!='.'){
   outputfolder <- file.path(getwd(), results_enrich )
}else{
   outputfolder <- file.path(getwd() )}


# Create dir to put results from enrichment
if(!dir.exists(outputfolder))
   suppressWarnings(dir.create(outputfolder))

setwd( outputfolder)

 

6.3 Overview of the Enrichment.R code

Note: This part of the code can be run without any editing from the researcher! We present it in case some researchers want to modify or adapt it to their needs.

 

\label{fig:enrichworkflowcommon}Functional enrichment flowchart. Detailed common enrichment for blood, placenta and other

Figure 15: Functional enrichment flowchart
Detailed common enrichment for blood, placenta and other

 

6.3.1 Read list of CpGs for enrichment

 

The procedure that we will detail here is executed for each one of the files entered in the variable FilesToEnrich. We show how it works with a list of CpG names (first file in FilesToEnrich variable) and with meta-analysis output results (second file in FilesToEnrich variable).

 

i <- 1 # We get first file in FilesToEnrich

# Enrich all CpGs
allCpGs <- FALSE

# Get data
data <- NULL
data <- read.table(FilesToEnrich[i], header = TRUE, 
                   sep = "", dec = ".", stringsAsFactors = FALSE)

 

6.3.2 Annotate CpGs (if working with lists of CpG names)

After defining the variables specific to each study, we read the file content and test if data is a list of CpG names or an output file from Module 2 – Meta-analysis of EWAS results. If the input file is a list of CpG names we first annotate it with the get_annotations function (meta-analysis results files are already annotated). In get_annotattions we need to define the artype (array type) parameter , and the data to be annotated.

 

if(dim(data)[1] <= 1 | dim(data)[2] <= 1) {
   data <- read.table(FilesToEnrich[i], dec = ".") # Avoid header
   data <- as.vector(t(data))
   head(data)
   data <- get_annotattions(data, artype, FilesToEnrich[i], outputfolder )
   
   head(data)
   
   allCpGs <- TRUE
   data$chromosome <- substr(data$chr,4,length(data$chr))
   data$rs_number <- data$CpGs
}

 

6.3.3 Get list of unique genes

 

The getUniqueGenes function retrieves the list of unique genes annotated to the significant CpGs and stores the result in geneUniv variable. This list of genes will be used to perform the gene-set enrichment analyses.

 

      # get unique genes from data
geneUniv <- lapply( lapply(miss_enrich[grepl("signif", names(miss_enrich))], 
                           function(cpgs) { 
                              data[which(as.character(data$CpGs) %in% cpgs),]$UCSC_RefGene_Name
                              }),
                    getUniqueGenes)

geneUniv

 

6.3.4 Gene-set enrichment: GO terms and KEGG

 

The gene-set enrichment analysis is conducted with the missMethyl_enrichment function (which is based on the missmethyl R package (Maksimovic, Gordon, and Oshlack 2012), (Phipson and Oshlack 2014), (Maksimovic et al. 2015), (Phipson, Maksimovic, and Oshlack 2015)). missmethyl performs the enrichment by taking into account the number of probes per gene on the array, as well as taking into account multi-gene associated probes. It uses two reference databases:

  • Gene Ontology (GO) terms
  • Kyoto Encyclopedia of Genes and Genomes (KEGG)

If the list of CpGs does not contain p-values, all the CpGs will be used for the enrichment. If the list of CpGs contains p-values, we will use those passing a specific p-value threshold (indicated in the section 6.2).

 

## -- Functional Enrichmnet
## ------------------------

# Enrichment with missMethyl - GO and KEGG --> Writes results to outputfolder
miss_enrich <- missMethyl_enrichment(data, outputfolder, FilesToEnrich[i], 
                                     artype, BN, FDR, pvalue, allCpGs, plots = TRUE )

head(miss_enrich$GO)
head(miss_enrich$KEGG)

 

6.3.5 Gene-set enrichment: Molecular Signatures Database (MSigDB)

 

MSigDB_enrichment function conducts a functional enrichment using missmethyl R package and the public database Molecular Signatures Database (MSigDB). The same parameters as above have to be indicated

 

## -- Molecular Enrichmnet
## -----------------------

# Molecular Signatures Database enrichment
msd_enrich <- MSigDB_enrichment(data, outputfolder, FilesToEnrich[i], artype, BN, FDR, pvalue, allCpGs)

head(msd_enrich$MSigDB)

 

6.3.6 Gene-set enrichment: ConsensusPathDB

 

get_consensusPdb_OverRepresentation function conducts a functional enrichment using brgEnrich R package and the public database ConsensusPathDB Database (http://consensuspathdb.org). We can get enrichment for different FileSet types :

 

  • P manually curated pathways from pathway databases
  • G2 Gene Ontology-based sets, GO level 2
  • G3 Gene Ontology-based sets, GO level 3
  • G4 Gene Ontology-based sets, GO level 4
  • G5 Gene Ontology-based sets, GO level 5

 

## -- Online Tools

# Enrichment with ConsensusPathDB
#     - Consensus path http://consensuspathdb.org/ 
#     (gene-set analysis – over-representation analysis)

# Available FSet types :
# 1 P     manually curated pathways from pathway databases
# 3 G2    Gene Ontology-based sets, GO level 2
# 4 G3    Gene Ontology-based sets, GO level 3
# 5 G4    Gene Ontology-based sets, GO level 4
# 6 G5    Gene Ontology-based sets, GO level 5

acFSet <- c('P', 'G2', 'G3')
acType <- 'entrez-gene'

# Get Enrichment
CPDB_enrich <- lapply(names(geneUniv), function( data, accFSet, genes ) {
   print(data)
   lapply(accFSet,
          get_consensusPdb_OverRepresentation,
          entityType='genes',
          accNumbers=na.omit(as.character(eval(parse(text = paste0("genes$",data))))),
          accType=acType,
          outputdir = "ConsensusPathDB",
          outputfile = gsub(".", "_", data, fixed=TRUE) )},
   accFSet = acFSet, genes = geneUniv)

names(CPDB_enrich) <- names(geneUniv)

6.3.7 Enrichment for gene relative positions

 

In this section we will test the enrichment of statistically significant CpGs for gene relative positions (annotated with the Illumina annotation as: 1st Exon, 3’UTR, 5’UTR, Gene Body, TSS1500 (distal promoter), TSS200 (proximal promoter)).

Given that we have provided the coefficients of the association, we can do the enrichment for all statistically significant CpGs and also stratifying by hypo- and hyper-methylated CpGs. This is done with the getHyperHypo function which classifies CpGs in significant or not (for FDR, BN or the indicated nominal p-value threshold) and in hypo- and hyper-methylated.

 

if("FDR" %in% colnames(data) & "Bonferroni" %in% colnames(data))
{

   ## -- Prepare data
   ## ---------------

   # Add column bFDR to data for that CpGs that accomplish with FDR
    # Classify fdr into "yes" and no taking into account FDR significance level
   data$bFDR <- getBinaryClassificationYesNo(data$FDR, "<", FDR) 

   # Classify by Hyper and Hypo methylated
   data$meth_state <- getHyperHypo(data$beta) # Classify methylation into Hyper and Hypo

   # CpGs FDR and Hyper and Hypo respectively
   FDR_Hyper <- ifelse(data$bFDR == 'yes' & 
                          data$meth_state=='Hyper', "yes", "no")
   FDR_Hypo <- ifelse(data$bFDR == 'yes' & 
                         data$meth_state=='Hypo', "yes", "no")

   # CpGs Bonferroni and Hyper and Hypo respectively
   BN_Hyper <- ifelse(data$Bonferroni == 'yes' & 
                         data$meth_state=='Hyper', "yes", "no")
   BN_Hypo <- ifelse(data$Bonferroni == 'yes' & 
                        data$meth_state=='Hypo', "yes", "no")

 

Then, we test the enrichment of the CpGs for gene relative positions with the getAllFisherTest (Fisher test) or the getAllHypergeometricTest (Hypergeometric test) functions. These functions write the results in the outputfile inside the previously indicated folder outputdir.

 

## --  CpG Gene position
## ---------------------

# Get descriptives
get_descriptives_GenePosition(data$UCSC_RefGene_Group, 
                              data$Bonferroni, 
                              "Bonferroni", 
                              outputdir = "GenePosition/Fisher_BN_Desc",
                              outputfile = FilesToEnrich[i])
get_descriptives_GenePosition(data$UCSC_RefGene_Group, d
                              ata$bFDR , "FDR", 
                              outputdir = "GenePosition/Fisher_FDR_Desc", 
                              outputfile = FilesToEnrich[i])


if( tolower(testdata) =='fisher') {
   ## --  Fisher Test - Gene position - FDR, FDR_hyper and FDR_hypo
   GenePosition_fdr <- getAllFisherTest(data$bFDR, 
                                  data$UCSC_RefGene_Group, 
                                  outputdir = "GenePosition/Fisher_FDR", 
                                  outputfile = FilesToEnrich[i], 
                                  plots = TRUE )
   GenePosition_fdr_hyper <- getAllFisherTest(FDR_Hyper, 
                                  data$UCSC_RefGene_Group, 
                                  outputdir = "GenePosition/Fisher_FDRHyper", 
                                  outputfile = FilesToEnrich[i], 
                                  plots = TRUE )
   GenePosition_fdr_hypo <- getAllFisherTest(FDR_Hypo, 
                                    data$UCSC_RefGene_Group, 
                                    outputdir = "GenePosition/Fisher_FDRHypo", 
                                    outputfile = FilesToEnrich[i], plots = TRUE )
}
else if ( tolower(testdata) =='hypergeometric') {
   ## --  HyperGeometric Test - Island relative position - 
   ## FDR, FDR_hyper and FDR_hypo (for Depletion and Enrichment)
   GenePosition_fdr <- getAllHypergeometricTest(data$bFDR, 
                                    data$UCSC_RefGene_Group, 
                                    outputdir = "GenePosition/HyperG_FDR", 
                                    outputfile = FilesToEnrich[i])
   GenePosition_fdr_hyper <- getAllHypergeometricTest(FDR_Hyper,
                                    data$UCSC_RefGene_Group,
                                    outputdir = "GenePosition/HyperG_FDRHyper", 
                                    outputfile = FilesToEnrich[i])
   GenePosition_fdr_hypo <- getAllHypergeometricTest(FDR_Hypo, 
                                   data$UCSC_RefGene_Group,
                                   outputdir = "GenePosition/HyperG_FDRHypo", 
                                   outputfile = FilesToEnrich[i])
}

 

Besides getting the results in a table, we also can get a plot with the odds ratios (ORs) of enrichment with the plot_TestResults_Collapsed function.

 

         plot_TestResults_Collapsed(list(fdr = GenePosition_fdr, 
                                         fdr_hypo = GenePosition_fdr_hypo, 
                                         fdr_hyper = GenePosition_fdr_hyper),
                                    outputdir = "GenePosition", 
                                    outputfile = FilesToEnrich[i], main = )

 

\label{fig:genepos}Gene position with Fisher test for Hyper and Hypo methylated CpGs

Figure 16: Gene position with Fisher test for Hyper and Hypo methylated CpGs

 

6.3.8 Enrichment for CpG island relative positions

 

Similar as above, and with the function get_descriptives_RelativetoIsland we can get an enrichment for CpG island relative positions (CpG island, N-shore, N-shelf, S-shore, S-shelf). The function gives a plot together with the corresponding statistical results in a table.

 

## --  CpG Island relative position
## --------------------------------

# Get descriptives
get_descriptives_RelativetoIsland(data$Relation_to_Island, 
                            data$Bonferroni, 
                            "Bonferroni", 
                            outputdir = "RelativeToIsland/Fisher_BN_RelativeToIsland", 
                            outputfile = FilesToEnrich[i])
get_descriptives_RelativetoIsland(data$Relation_to_Island, 
                            data$bFDR , 
                            "FDR", 
                            outputdir = "RelativeToIsland/Fisher_FDR_RelativeToIsland",
                            outputfile = FilesToEnrich[i])


if( tolower(testdata) =='fisher') {
   ## --  Fisher Test - Position Relative to Island - FDR, FDR_hyper and FDR_hypo
   relative_island_fdr <- getAllFisherTest(data$bFDR, 
                                  data$Relation_to_Island, 
                                  outputdir = "RelativeToIsland/Fisher_FDR", 
                                  outputfile = FilesToEnrich[i], plots = TRUE )
   relative_island_fdr_hyper <- getAllFisherTest(FDR_Hyper, 
                                  data$Relation_to_Island, 
                                  outputdir = "RelativeToIsland/Fisher_FDRHyper", 
                                  outputfile = FilesToEnrich[i], plots = TRUE )
   relative_island_fdr_hypo <- getAllFisherTest(FDR_Hypo,
                                       data$Relation_to_Island, 
                                       outputdir = "RelativeToIsland/Fisher_FDRHypo", 
                                       outputfile = FilesToEnrich[i], plots = TRUE )
}

 

\label{fig:genepos}Gene position with Fisher test for Hyper and Hypo methylated CpGs

Figure 17: Gene position with Fisher test for Hyper and Hypo methylated CpGs

 

6.3.9 Specific molecular enrichments for blood

 

As we show in Figure 15,there are some functional enrichment analyses specific for each tissue. For blood, the code tests the enrichment for ROADMAP blood 15 chromatin states reference. To this end, each CpG in the array is annotated to one or several chromatin states by taking a state as present in that locus if it is described in at least 1 of the 27 blood-related cell types present in ROADMAP.

 

\label{fig:enrichworkflowblood}Enrichment flowchart. Detailed Blood enrichment

Figure 18: Enrichment flowchart
Detailed Blood enrichment

 

Here, we also can select the Fisher or Hypergeometric test, and the enrichment is done for all CpGs and separately for hyper- and hypo-methylated CpGs (if Betas of the association are provided).

 

## --  ROADMAP  -  Metilation in Cromatine States - BLOOD
## -------------------------------------------------------
##  Analysis of methylation changes in the different chromatin 
##  states (CpGs are diff meth in some states and others don't)

# Prepare data
# Adds chromatine state columns
crom_data <- addCrom15Columns(data, "CpGId") 

if("FDR" %in% colnames(data) & "Bonferroni" %in% colnames(data))
{

   # Columns with chromatin status information :
   ChrStatCols <- c("TssA","TssAFlnk","TxFlnk","TxWk","Tx","EnhG",
                    "Enh","ZNF.Rpts","Het","TssBiv","BivFlnk",
                    "EnhBiv","ReprPC","ReprPCWk","Quies")

   if( !is.na(FDR) ) {
      chrom_states_fdr <- getAllChromStateOR( crom_data$bFDR, 
                                  crom_data[,ChrStatCols], 
                                  outputdir = "CromStates/OR_FDR", 
                                  outputfile = FilesToEnrich[i], 
                                  plots = TRUE )
      chrom_states_fdr_hyper <- getAllChromStateOR( FDR_Hyper, 
                                  crom_data[,ChrStatCols], 
                                  outputdir = "CromStates/OR_FDRHyper", 
                                  outputfile = FilesToEnrich[i], 
                                  plots = TRUE )
      chrom_states_fdr_hypo <- getAllChromStateOR( FDR_Hypo,
                                 crom_data[,ChrStatCols], 
                                 outputdir = "CromStates/OR_FDRHypo", 
                                 outputfile = FilesToEnrich[i], 
                                 plots = TRUE )
   }
   if ( BN == TRUE) {
      chrom_states_bn <- getAllChromStateOR( crom_data$Bonferroni, 
                                 crom_data[,ChrStatCols], 
                                 outputdir = "CromStates/OR_BN", 
                                 outputfile = FilesToEnrich[i], 
                                 plots = TRUE )
      chrom_states_bn_hyper <- getAllChromStateOR( BN_Hyper, 
                                 crom_data[,ChrStatCols], 
                                 outputdir = "CromStates/OR_BNHyper", 
                                 outputfile = FilesToEnrich[i], 
                                 plots = TRUE )
      chrom_states_bn_hypo <- getAllChromStateOR( BN_Hypo, 
                                crom_data[,ChrStatCols], 
                                outputdir = "CromStates/OR_BNHypo", 
                                outputfile = FilesToEnrich[i], 
                                      plots = TRUE )
   }
}

 

6.3.10 Specific molecular enrichments for placenta

 

In the case of working with placenta, several enrichments are performed. All they can be conducted applying a Fisher test or a Hypergeometric test, and also if Betas of the association is provided, then they are done for all CpGs as well as for hypo- and hyper-methylated CpGs.

 

\label{fig:enrichworkflowplacenta}Enrichment flowchart. Detailed Placenta enrichment

Figure 19: Enrichment flowchart
Detailed Placenta enrichment

 

A) ROADMAP fetal placenta chromatin states

Enrichment for fetal placenta chromatin states is based on the data released from ROADMAP:

  • FP_15_E091 (15 chromatin states)
  • FP_18_E091 (18 chromatin states)

 

## -- ROADMAP  -  Regulatory feature enrichment analysis - PLACENTA
## -----------------------------------------------------------------

# Convert to Genomic Ranges
data.GRange <- GRanges(
   seqnames = Rle(data$chr),
   ranges=IRanges(data$pos, end=data$pos),
   name=data$CpGs,
   chr=data$chromosome,
   pos=data$pos
)
names(data.GRange) <- data.GRange$name

# Find overlaps between CpGs and Fetal Placenta (States 15 and 18)
over15 <- findOverlapValues(data.GRange, FP_15_E091 )

if (enrichFP18 == TRUE){
   over18 <- findOverlapValues(data.GRange, FP_18_E091 )
   # Add states 15 and 18 to data.GRange file
   # and write to a file : CpGs, state15 and state18
   data.chrstates <- c(mcols(over15$ranges), over15$values, over18$values)
   colnames(data.chrstates)[grep("States",colnames(data.chrstates))] <-  
      c("States15_FP", "States18_FP")
} else {
   # Add states 15 to data.GRange file and write to a file : CpGs, state15
   data.chrstates <- c(mcols(over15$ranges), over15$values)
   colnames(data.chrstates)[grep("States",colnames(data.chrstates))] <-  
      c("States15_FP")
}

# Merge annotated data with chromatine states with states with data
crom_data <- merge(data, data.chrstates, by.x = "CpGs", by.y = "name" )

fname <- paste0("ChrSates_Pla_data/List_CpGs_",
                tools::file_path_sans_ext(basename(FilesToEnrich[i])),
                "_annot_plac_chr_states.txt")
dir.create("ChrSates_Pla_data", showWarnings = FALSE)
write.table( crom_data, fname, quote=F, row.names=F, sep="\t")

## --  Fisher Test - States15_FP - BN,  BN_hyper and BN_hypo 
## (Depletion and Enrichment)
States15FP_bn <- getAllFisherTest(crom_data$Bonferroni,
                               crom_data$States15_FP, 
                               outputdir = "ChrSates_15_Pla/Fisher_BN", 
                               outputfile = FilesToEnrich[i])
States15FP_bnhyper <- getAllFisherTest(BN_Hyper, 
                                 crom_data$States15_FP, 
                                 outputdir = "ChrSates_15_Pla/Fisher_BNHyper", 
                                 outputfile = FilesToEnrich[i])
States15FP_bnhypo <- getAllFisherTest(BN_Hypo, 
                                crom_data$States15_FP, 
                                outputdir = "ChrSates_15_Pla/Fisher_BNHypo", 
                                outputfile = FilesToEnrich[i])


## --  Plot collapsed data HyperGeometric Test - States15_FP - BN
plot_TestResults_Collapsed(list(bn = States15FP_bn, 
                                bn_hypo = States15FP_bnhypo, 
                                bn_hyper = States15FP_bnhyper),
                           outputdir = "ChrSates_15_Pla", 
                           outputfile = FilesToEnrich[i])

 

\label{fig:chr15pla1}Chromatine states 15 for placenta - Fisher test for Hyper and Hypo methylated CpGs

Figure 20: Chromatine states 15 for placenta - Fisher test for Hyper and Hypo methylated CpGs

 

B) Placenta partially methylated domains (PMDs)

Placenta partially methylated domains (PMDs) are large regions that cover approximately 37% of the human genome, which are partially methylated and contain placenta-specific repressed genes. Enrichment for placenta PMDs is based on the list provided in (Schroeder et al. 2013)

 

## -- Partially Methylated Domains (PMDs) PLACENTA
## ------------------------------------------------

# Create genomic ranges from PMD data
PMD.GRange <- getEnrichGenomicRanges(PMD_placenta$Chr_PMD, 
                                     PMD_placenta$Start_PMD, 
                                     PMD_placenta$End_PMD)

# Find overlaps between CpGs and PMD (find subject hits, query hits )
overPMD <- findOverlapValues(data.GRange, PMD.GRange )

#Create a data.frame with CpGs and PMDs information
mdata <- as.data.frame(cbind(DataFrame(CpG = data.GRange$name[overPMD$qhits]), 
                             DataFrame(PMD = PMD.GRange$name[overPMD$shits])))

# Merge with results from meta-analysis (A2)
crom_data <- merge(crom_data, mdata, by.x="CpGs", by.y="CpG",all=T)
# crom_data <- crom_data[order(crom_data$p.value),]

# CpGs with PMD as NA
PMD_NaN <- ifelse(is.na(crom_data$PMD),'IsNA','NotNA' )


## --  Fisher Test - PMD - BN,  BN_hyper and BN_hypo  
## (Full data ) (Depletion and Enrichment)
PMD_bn <- getAllFisherTest(crom_data$Bonferroni, 
                           PMD_NaN, 
                           outputdir = "PMD_Pla/Fisher_BN", 
                           outputfile = FilesToEnrich[i])
PMD_bnhyper <- getAllFisherTest(BN_Hyper, 
                                PMD_NaN, 
                                outputdir = "PMD_Pla/Fisher_BNHyper", 
                                outputfile = FilesToEnrich[i])
PMD_bnhypo <- getAllFisherTest(BN_Hypo, 
                               PMD_NaN, 
                               outputdir = "PMD_Pla/Fisher_BNHypo", 
                               outputfile = FilesToEnrich[i])

 ## --  Plot collapsed data HyperGeometric Test - States15_FP - BN
plot_TestResults_Collapsed(list(bn = PMD_bn, 
                                bn_hypo = PMD_bnhypo, 
                                bn_hyper = PMD_bnhyper),
                           outputdir = "PMD_Pla", 
                           outputfile = FilesToEnrich[i])

 

\label{fig:pmd15pla}Partial Metilated Domains for placenta - Fisher test for Hyper and Hypo methylated CpGs

Figure 21: Partial Metilated Domains for placenta - Fisher test for Hyper and Hypo methylated CpGs

 

C) Placenta imprinted regions

Placenta germline differentially methylated regions (gDMRs) are regions of the genome which show differential methylation according to the parental origin of the genome. They are associated with expression of imprinted genes, genes which are expressed in a parent-of-originspecific manner. Enrichment for placenta imprinted regions is based on the list provided in (Hamada et al. 2016)

 

## -- Imprinting Regions PLACENTA
## -------------------------------

# Create genomic ranges from DMR data
DMR.GRange <- getEnrichGenomicRanges(IR_Placenta$Chr_DMR, 
                                     IR_Placenta$Start_DMR, 
                                     IR_Placenta$End_DMR)

# Find overlaps between CpGs and DMR (find subject hits, query hits )
overDMR <- findOverlapValues(data.GRange, DMR.GRange )

#Create a data.frame with CpGs and DMRs information
mdata <- as.data.frame(cbind(DataFrame(CpG = data.GRange$name[overDMR$qhits]), 
                             DataFrame(DMR = DMR.GRange$name[overDMR$shits])))

# Merge with results from meta-analysis (A2)
crom_data <- merge(crom_data, mdata, by.x="CpGs", by.y="CpG",all=T)

# CpGs with DMR as NA
DMR_NaN <- ifelse(is.na(crom_data$DMR.y),'IsNA','NotNA' )

## --  Fisher Test - DMR - BN,  BN_hyper and BN_hypo  
## (Full data ) (Depletion and Enrichment)
DMR_bn <- getAllFisherTest(crom_data$Bonferroni,
                           DMR_NaN, 
                           outputdir = "DMR_Pla/Fisher_BN", 
                           outputfile = FilesToEnrich[i])
DMR_bnhyper <- getAllFisherTest(BN_Hyper, 
                                DMR_NaN, 
                                outputdir = "DMR_Pla/Fisher_BNHyper", 
                                outputfile = FilesToEnrich[i])
DMR_bnhypo <- getAllFisherTest(BN_Hypo, 
                               DMR_NaN, 
                               outputdir = "DMR_Pla/Fisher_BNHypo", 
                               outputfile = FilesToEnrich[i])

 ## --  Plot collapsed data HyperGeometric Test - States15_FP - BN
plot_TestResults_Collapsed(list(bn = DMR_bn, 
                                bn_hypo = DMR_bnhypo, 
                                bn_hyper = DMR_bnhyper),
                           outputdir = "DMR_Pla", outputfile = FilesToEnrich[i])

 

\label{fig:chr15pla}Imprinted Regions for placenta - Fisher test for Hyper and Hypo methylated CpGs

Figure 22: Imprinted Regions for placenta - Fisher test for Hyper and Hypo methylated CpGs

 

References

Fernandez-Jimenez, Nora, Catherine Allard, Luigi Bouchard, Patrice Perron, Mariona Bustamante, Jose Ramon Bilbao, and Marie-France Hivert. 2019. “Comparison of Illumina 450k and EPIC Arrays in Placental DNA Methylation.” Epigenetics 14 (12): 1177–82.
Hamada, Hirotaka, Hiroaki Okae, Hidehiro Toh, Hatsune Chiba, Hitoshi Hiura, Kenjiro Shirane, Tetsuya Sato, et al. 2016. “Allele-Specific Methylome and Transcriptome Analysis Reveals Widespread Imprinting in the Human Placenta.” The American Journal of Human Genetics 99 (5): 1045–58.
Maksimovic, Jovana, Johann A Gagnon-Bartsch, Terence P Speed, and Alicia Oshlack. 2015. “Removing Unwanted Variation in a Differential Methylation Analysis of Illumina HumanMethylation450 Array Data.” Nucleic Acids Research, gkv526.
Maksimovic, Jovana, Lavinia Gordon, and Alicia Oshlack. 2012. “SWAN: Subset-Quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips.” Genome Biology 13 (6): R44.
Phipson, Belinda, Jovana Maksimovic, and Alicia Oshlack. 2015. “missMethyl: An r Package for Analysing Methylation Data from Illuminas HumanMethylation450 Platform.” Bioinformatics, btv560.
Phipson, Belinda, and Alicia Oshlack. 2014. “DiffVar: A New Method for Detecting Differential Variability with Application to Methylation in Cancer and Aging.” Genome Biology 15 (9): 465.
Schroeder, Diane I, John D Blair, Paul Lott, Hung On Ken Yu, Danna Hong, Florence Crary, Paul Ashwood, et al. 2013. “The Human Placenta Methylome.” Proceedings of the National Academy of Sciences 110 (15): 6037–42.
Schroeder, Diane I, and Janine M LaSalle. 2013. “How Has the Study of the Human Placenta Aided Our Understanding of Partially Methylated Genes?” Epigenomics 5 (6): 645–54.
Solomon, Olivia, Julie MacIsaac, Hong Quach, Gwen Tindula, Michael S Kobor, Karen Huen, Michael J Meaney, Brenda Eskenazi, Lisa F Barcellos, and Nina Holland. 2018. “Comparison of DNA Methylation Measured by Illumina 450k and EPIC BeadChips in Blood of Newborns and 14-Year-Old Children.” Epigenetics 13 (6): 655–64.
Zhou, Wanding, Peter W Laird, and Hui Shen. 2017. “Comprehensive Characterization, Annotation and Innovative Use of Infinium DNA Methylation BeadChip Probes.” Nucleic Acids Research 45 (4): e22–22.